
    Online variable-sized bin packing

    The classical bin packing problem is one of the best-known and most widely studied problems of combinatorial optimization. Efficient offline approximation algorithms have recently been designed and analyzed for the more general and realistic model in which bins of differing capacities are allowed (Friesen and Langston, 1986). In this paper, we consider fast online algorithms for this challenging model. Selecting either the smallest or the largest available bin size to begin a new bin as pieces arrive turns out to yield a tight worst-case ratio of 2. We devise a slightly more complicated scheme that uses the largest available bin size for small pieces and selects bin sizes for large pieces based on a user-specified fill factor f ≥ 1/2, and we prove that this strategy guarantees a worst-case bound not exceeding 1.5 + f/2.
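
    The two simple rules above can be sketched in Python as follows. This is a minimal illustration, not the paper's algorithm: it assumes First Fit placement into already-open bins (the abstract does not name a placement rule), reads "smallest available bin size" as the smallest size that can hold the arriving piece, and measures cost as the total capacity of all bins opened. The function `pack_online` and its `choose` parameter are hypothetical names; the fill-factor scheme is not reproduced because its rules for large pieces are not given here.

```python
from typing import List, Literal, Tuple

EPS = 1e-12  # tolerance for floating-point capacity checks


def pack_online(
    pieces: List[float],
    bin_sizes: List[float],
    choose: Literal["largest", "smallest"] = "largest",
) -> Tuple[List[Tuple[float, List[float]]], float]:
    """Pack pieces in arrival order and return (bins, total capacity opened).

    Each bin is a (capacity, contents) pair. Assumed placement rule: First
    Fit, i.e. the earliest-opened bin with enough room receives the piece.
    """
    if not bin_sizes:
        raise ValueError("need at least one bin size")
    largest = max(bin_sizes)
    bins: List[Tuple[float, List[float]]] = []
    loads: List[float] = []
    for p in pieces:
        if p <= 0 or p > largest + EPS:
            raise ValueError(f"piece {p} does not fit any bin size")
        # Assumed First Fit: scan open bins in the order they were opened.
        for i, (cap, contents) in enumerate(bins):
            if loads[i] + p <= cap + EPS:
                contents.append(p)
                loads[i] += p
                break
        else:
            # No open bin has room, so a new bin is opened; being online,
            # this choice is never revisited.
            if choose == "largest":
                cap = largest
            else:
                # Assumed reading: smallest size that can hold this piece.
                cap = min(s for s in bin_sizes if s + EPS >= p)
            bins.append((cap, [p]))
            loads.append(p)
    return bins, sum(cap for cap, _ in bins)


if __name__ == "__main__":
    for rule in ("largest", "smallest"):
        _, cost = pack_online([0.4, 0.4, 0.3], [1.0, 0.5], rule)
        print(rule, cost)
```

    On pieces [0.4, 0.4, 0.3] with bin sizes [1.0, 0.5] the largest-size rule costs 2.0 and the smallest-size rule 1.5, while on [0.3, 0.3, 0.3] the ranking reverses (1.0 versus 1.5), so neither rule dominates on every input.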

    Value of the analytic-deliberative framework in environmental decisionmaking: The case of Oklahoma water planning

    Scope and Method of Study: This study investigates the relationship between the level (robustness) of public participation and the social acceptability of policy recommendations that emanate from environmental decision-making processes. In particular, the study investigates whether analytic-deliberative processes increase the social acceptability of watershed planning recommendations relative to those generated by less robust participation processes such as consultation. Three statewide comprehensive water-planning efforts and two regional planning efforts in Oklahoma, USA, were included in this study. The robustness of each participation process was judged using Sherry Arnstein's ladder of public participation. Social acceptability of plan recommendations was judged by elites familiar with water planning and with public preferences across the state. Their judgments were validated by comparison with large random-sample surveys of citizens in two of the five planning efforts. The relationship between participation robustness and recommendation acceptability was analyzed both for individual recommendations and for categories of similar recommendations. Findings and Conclusions: Elite judgments compared favorably with those obtained directly from citizens in the two random-sample telephone surveys, indicating that elite judgments could serve as a proxy for public preferences, although they proved somewhat more pessimistic. A weak but statistically significant positive relationship between the level of public participation and recommendation acceptability was found. In particular, recommendations from plans that involved at least some public participation were more acceptable than those from plans that involved none. The relationship was strongest for recommendations that were more salient, controversial, and populist (seen by citizens as appropriate for public discourse). A model developed by Will Focht can explain this relationship as the outcome of trust judgments: citizens prefer to be more involved when both government and social trust are lower and facts are less certain.

    Graph algorithms for machine learning: a case-control study based on prostate cancer populations and high throughput transcriptomic data

    Background The continuing proliferation of high-throughput biological data promises to revolutionize personalized medicine. Confirming the presence or absence of disease is an important goal. In this study, we seek to identify genes, gene products and biological pathways that are crucial to human health, with prostate cancer chosen as the target disease. Materials and methods Using case-control transcriptomic data, we devise a graph-theoretical toolkit for this task. It employs both innovative algorithms and novel two-way correlations to pinpoint putative biomarkers that classify unknown samples as cancerous or normal. Results and conclusion Observed accuracy on real data suggests that we are able to achieve a sensitivity of 92% and a specificity of 91%.
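
    Sensitivity and specificity are the standard ratios TP / (TP + FN) and TN / (TN + FP). The short Python sketch below computes both from true and predicted labels; the function name, the "cancer"/"normal" labels, and the sample counts in the usage lines are hypothetical and chosen only so the ratios match the reported 92% and 91%, not taken from the study's data.

```python
from typing import Hashable, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = "cancer",
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a binary classifier.

    sensitivity = TP / (TP + FN): share of true cases flagged as positive
    specificity = TN / (TN + FP): share of true controls flagged as negative
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts: 50 cases (46 flagged), 100 controls (91 cleared).
    truth = ["cancer"] * 50 + ["normal"] * 100
    pred = (["cancer"] * 46 + ["normal"] * 4
            + ["normal"] * 91 + ["cancer"] * 9)
    print(sensitivity_specificity(truth, pred))  # (0.92, 0.91)
```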

    EntropyExplorer: An R Package for Computing and Comparing Differential Shannon Entropy

    Background: Differential Shannon entropy (DSE) and differential coefficient of variation (DCV) are effective metrics for the study of gene expression data. They can serve to augment differential expression (DE), and can be applied in numerous settings whenever one seeks to measure differences in variability rather than mere differences in magnitude. A general-purpose, easily accessible tool for DSE and DCV would help make these two metrics available to data scientists. Automated p value computations would additionally be useful, since p values are often easier to interpret than raw test statistic values alone. Results: EntropyExplorer is an R package for calculating DSE, DCV and DE. It also computes corresponding p values for each metric. All features are available through a single R function call. Based on extensive investigations in the literature, the Fligner-Killeen test was chosen to compute DCV p values. No standard method was found to be appropriate for DSE, so permutation testing is used to calculate DSE p values. Conclusions: EntropyExplorer provides a convenient resource for calculating DSE, DCV, DE and associated p values. The package, along with its source code and reference manual, is freely available from the CRAN public repository at http://cran.r-project.org/web/packages/EntropyExplorer/index.html
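
    A minimal Python sketch of a DSE statistic with a label-permutation p value follows. It is not EntropyExplorer's code (the package is written in R): it assumes DSE is the absolute difference of Shannon entropies computed from each group's expression values treated as proportions, and it divides each entropy by ln(n) so groups of different sizes are comparable; the package's exact definitions may differ. The function names and the add-one p value correction are illustrative choices. The Fligner-Killeen test used for DCV is not reproduced here; SciPy offers one as `scipy.stats.fligner`.

```python
import math
import random
from typing import Sequence, Tuple


def normalized_entropy(values: Sequence[float]) -> float:
    """Shannon entropy of positive values treated as proportions, divided by
    ln(n) so groups of different sizes share a [0, 1] scale (assumed)."""
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two positive values")
    total = sum(values)
    h = -sum((v / total) * math.log(v / total) for v in values)
    return h / math.log(len(values))


def dse(a: Sequence[float], b: Sequence[float]) -> float:
    """Differential Shannon entropy as an absolute difference (assumed form)."""
    return abs(normalized_entropy(a) - normalized_entropy(b))


def dse_permutation_test(
    a: Sequence[float], b: Sequence[float], n_perm: int = 999, seed: int = 0
) -> Tuple[float, float]:
    """Return (observed DSE, permutation p value) by shuffling group labels."""
    rng = random.Random(seed)
    observed = dse(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split with the original group sizes; the tolerance guards
        # against float noise when a permutation reproduces the observed split.
        if dse(pooled[: len(a)], pooled[len(a):]) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    uniform = [10.0] * 6                       # evenly spread expression
    skewed = [1.0, 1.0, 1.0, 1.0, 1.0, 50.0]   # one dominant sample
    print(dse_permutation_test(uniform, skewed))
```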

    Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

    Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets differing in size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests, we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to changes in k.
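
    TF-IDF carries over to genomes by treating each genome as a document and each overlapping word of length k (k-mer) as a term: a k-mer that is frequent in one genome but present in few others receives a high weight. The Python sketch below computes those weights under the textbook definitions tf = count / total and idf = ln(N / df). It is a generic illustration, not the authors' pipeline; the abstract does not say how weights are aggregated into candidate lateral regions, so that step is omitted. Function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping words of length k (the 'terms' of a genome 'document')."""
    seq = seq.upper()
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def tfidf_kmers(genomes: Dict[str, str], k: int) -> Dict[str, Dict[str, float]]:
    """TF-IDF weight of every k-mer in every genome.

    tf  = count of the k-mer in this genome / total k-mers in this genome
    idf = ln(number of genomes / number of genomes containing the k-mer)
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_docs = len(counts)
    df: Counter = Counter()
    for c in counts.values():
        df.update(c.keys())  # each genome counts once per distinct k-mer
    scores: Dict[str, Dict[str, float]] = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            w: (n / total) * math.log(n_docs / df[w]) for w, n in c.items()
        }
    return scores


if __name__ == "__main__":
    demo = {"A": "ACGTACGT", "B": "ACGTTTTT", "C": "ACGTGGGG"}
    w = tfidf_kmers(demo, k=4)
    # 'ACGT' occurs in every genome, so its weight is 0; 'TTTT' is unique to B.
    print(w["A"]["ACGT"], round(w["B"]["TTTT"], 3))
```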

    A multifactorial obesity model developed from nationwide public health exposome data and modern computational analyses

    Summary Statement of the problem Obesity is both multifactorial and multimodal, making it difficult to identify, unravel and distinguish causative and contributing factors. The lack of a clear model of aetiology hampers the design and evaluation of interventions to prevent and reduce obesity. Methods Using modern graph-theoretical algorithms, we are able to coalesce and analyse thousands of inter-dependent variables and interpret their putative relationships to obesity. Our modelling is different from traditional approaches; we make no a priori assumptions about the population, and model instead based on the actual characteristics of a population. Paracliques, noise-resistant collections of highly-correlated variables, are differentially distilled from data taken over counties associated with low versus high obesity rates. Factor analysis is then applied and a model is developed. Results and conclusions Latent variables concentrated around social deprivation, community infrastructure and climate, and especially heat stress were connected to obesity. Infrastructure, environment and community organisation differed in counties with low versus high obesity rates. Clear connections of community infrastructure with obesity in our results lead us to conclude that community level interventions are critical. This effort suggests that it might be useful to study and plan interventions around community organisation and structure, rather than just the individual, to combat the nation's obesity epidemic.

    Comparison of threshold selection methods for microarray gene co-expression matrices

    Background: Network and clustering analyses of microarray co-expression correlation data often require application of a threshold to discard small correlations, thus reducing computational demands and decreasing the number of uninformative correlations. This study investigated threshold selection in the context of combinatorial network analysis of transcriptome data. Findings: Six conceptually diverse methods (based on number of maximal cliques, correlation of control spots with expressed genes, top 1% of correlations, spectral graph clustering, Bonferroni correction of p-values, and statistical power) were used to estimate a correlation threshold for three time-series microarray datasets. The validity of thresholds was tested by comparison to thresholds derived from Gene Ontology information. Stability and reliability of the best methods were evaluated with block bootstrapping. Two threshold methods, number of maximal cliques and spectral graph, used information in the correlation matrix structure and performed well in terms of stability. Comparison to Gene Ontology found that thresholds from the number of maximal cliques extracted from a co-expression matrix were the most biologically valid. Approaches to improve both methods were suggested. Conclusion: Threshold selection approaches based on the network structure of gene relationships gave thresholds with greater relevance to curated biological relationships than approaches based on statistical pair-wise relationships.
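
    The best-performing threshold method counts maximal cliques in the thresholded co-expression graph. A minimal Python sketch of that counting step follows, assuming an edge wherever |r| ≥ t and using Bron-Kerbosch with pivoting as the clique enumerator (our choice; the abstract does not name one). How the study turns the resulting curve into a single threshold is not described here, so the sketch only reports the count at each candidate threshold. Function names and the toy correlation matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def threshold_graph(corr: Sequence[Sequence[float]], t: float) -> Dict[int, Set[int]]:
    """Keep an edge between genes i and j when |r_ij| >= t (assumed rule)."""
    n = len(corr)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i][j]) >= t:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Bron-Kerbosch with pivoting; isolated genes count as 1-cliques."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex adjacent to most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_count_curve(
    corr: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> List[Tuple[float, int]]:
    """Number of maximal cliques at each candidate threshold."""
    return [(t, len(maximal_cliques(threshold_graph(corr, t)))) for t in thresholds]


if __name__ == "__main__":
    corr = [
        [1.0, 0.9, 0.8, 0.1],
        [0.9, 1.0, 0.85, 0.2],
        [0.8, 0.85, 1.0, 0.7],
        [0.1, 0.2, 0.7, 1.0],
    ]
    for t, count in clique_count_curve(corr, [0.6, 0.75, 0.82, 0.95]):
        print(t, count)
```

    On the toy matrix the counts are 2, 2, 3 and 4 for thresholds 0.6, 0.75, 0.82 and 0.95; isolated genes are counted as single-vertex cliques, which a real analysis might choose to exclude.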

    Computational, Integrative, and Comparative Methods for the Elucidation of Genetic Coexpression Networks

    Gene expression microarray data can be used for the assembly of genetic coexpression network graphs. Using mRNA samples obtained from recombinant inbred Mus musculus strains, it is possible to integrate allelic variation with molecular and higher-order phenotypes. The depth of quantitative genetic analysis of microarray data can be vastly enhanced by using this mouse resource in combination with powerful computational algorithms, platforms, and data repositories. The resulting network graphs transect many levels of biological scale. This approach is illustrated with the extraction of cliques of putatively coregulated genes and their annotation using gene ontology analysis and cis-regulatory element discovery. The causal basis for coregulation is detected through the use of quantitative trait locus mapping.